Time series regression studies have been widely used in environmental
epidemiology, notably in investigating the short-term associations
between exposures such as air pollution, weather variables or pollen,
and health outcomes such as mortality, myocardial infarction or
disease-specific hospital admissions. Typically, for both exposure and
outcome, data are available at regular time intervals (e.g. daily
pollution levels and daily mortality counts) and the aim is to explore
short-term associations between them. In this article, we describe the
general features of time series data, and we outline the analysis process,
beginning with descriptive analysis, then focusing on issues in time
series regression that differ from other regression methods: modelling
short-term fluctuations in the presence of seasonal and long-term
patterns, dealing with time-varying confounding factors and modelling
delayed ('lagged') associations between exposure and outcome.
We finish with advice on model checking and sensitivity analysis, and
some common extensions to the basic model.

Keywords Time series, environmental epidemiology, air pollution 



Introduction 

This article aims to introduce the reader to the meth- 
odological features and analytical issues involved in a 
study design commonly used in environmental epi- 
demiology: the time series regression study. The 
design is often used in studies attempting to quantify 
short-term associations of environmental exposures, 
such as air pollution, pollen, dust and weather vari- 
ables, with health outcomes. 1-3 We aim to provide the 
reader with an insight into some of the unique fea- 
tures and challenges involved in analysing time series 
data. It is hoped that 'consumers' of studies will gain 
insight into the methods and an understanding of 
specialist terminology used in this context, enabling 



more effective critical interpretation and appraisal of 
study reports; and also that epidemiologists who may 
be in a position to analyse time series datasets will 
find this a useful tutorial covering the key steps and 
important issues involved in actually carrying out a 
time series regression analysis. Our intention is to 
complement other articles which offer historical per- 
spectives, more mathematical developments of the 
modelling ideas than presented here, and which cover 
issues uniquely relevant to specific exposures such as 
ambient temperature and particulate matter 4-7 (fur- 
ther references are listed in the Supplementary appen- 
dix, available as Supplementary data at IJE online). It 
should be noted that though our focus is on time 
series regression, other tools for the analysis of time 






INTERNATIONAL JOURNAL OF EPIDEMIOLOGY



series data exist. Time series data occur frequently in 
econometrics; some methods that are commonly used 
in that field aim to forecast movements in a single 
time series (e.g. a stock market price), and would be 
of limited interest to epidemiologists, but others could 
in principle be applied to epidemiological questions. 
An example is the Granger causality test, which 
aims to establish, via a hypothesis testing paradigm, 
whether movements in one time series are causally 
related to movements in another. We do not consider 
this or other methods more commonly associated with
econometrics further in this paper. 8

Throughout, concepts and methods will be illu- 
strated through an example based on a real dataset, 
and Stata and R code to reproduce our analyses, along 
with the dataset itself, are available as a 
Supplementary Appendix at IJE online. 

Data features and introduction to 
worked example 

The illustrative example we will use is a time series 
regression analysis of a dataset from London. The data- 
set consists of a single observation for every day from 
1 January 2002 to 31 December 2006, and for each day 
there is a measure of (mean) ozone levels that day, 
and the total number of deaths that occurred in the 
city. The question to be addressed is 'Is there an asso- 
ciation between day-to-day variation in ozone levels 
and daily risk of death?', so the exposure of interest 
is ozone and the outcome is death. The dataset also 
contains daily measures of two potential confounders, 
temperature and relative humidity (confounding is dis- 
cussed later in the paper). The first 12 rows of data are 
shown in Table 1. Some features worth noting are: 

• Generally, a 'time series' is simply a sequence of 
data points recorded at regular time intervals. So 
in this dataset there are actually four time series 
(ozone, temperature, relative humidity and number 
of deaths), and the aim is to say something about 
if/how these are associated. 

• The main unit of analysis (represented by a row of 
data) is the day and not the individual person. This 
will be an important point when we come to con- 
sider what the potential confounders might be in 
our analysis. Note however that a time series re- 
gression study does not necessarily have to be at 
the daily level; annual, monthly, weekly, or even 
hourly time series data could be analysed using the 
same broad methodological principles. 

• The outcome is a count, which is common for time
series regression studies. The denominator (the
underlying population size) is not part of the dataset,
which is not a concern because in these data
we are usually interested in modelling variation in
outcome from day to day or week to week, and
population size is unlikely to change meaningfully
over these timescales, so can be safely omitted
from the analysis.

Table 1 Example rows of time series data from the London
dataset showing daily levels of environmental variables and
daily number of deaths

Descriptive analysis 

The first step should be familiar to epidemiologists 
from all specialties: getting to know the data through 
simple plots and tables. Figure 1 shows scatter plots 
of both the exposure (ozone) and outcome (number
of deaths) over time for the entire study period; a plot
of this type can quickly reveal high-level patterns in 
the data. Moving average plots can also be used to 
supplement raw scatter plots and draw out patterns — 
such plots effectively smooth out the raw data by 
averaging over a fixed number of adjacent raw data 
points. In this case, the raw plots show that both 
ozone levels and death counts seem to be dominated 
by annual seasonal patterns, with ozone highest in 
summer and lowest in winter, and the opposite pat- 
tern for deaths. Note that one would not generally 
infer from this that low ozone levels in winter are a 
'cause' of the higher mortality: systematic patterns 
over time are present in many time series, inducing 
correlations that are in most cases unlikely to repre- 
sent causal relationships. It is for this reason that our 
aim is to consider associations over relatively short 
timescales, which are more likely to represent real 
causal relationships. 6 
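The moving average smoothing mentioned above can be sketched in a few lines. This is a minimal illustration only: the function name `moving_average`, the centred window and the example counts are assumptions made here, not part of the published analysis or the London dataset.

```python
def moving_average(values, window=7):
    """Centred moving average over `window` adjacent points.

    Positions too close to either end to have a full window are
    returned as None rather than being averaged over fewer points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            chunk = values[i - half:i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Made-up daily counts for illustration (not the London data)
daily_deaths = [10, 12, 11, 13, 15, 14, 12]
print(moving_average(daily_deaths, window=3))
```

A wider window gives a smoother curve at the cost of hiding shorter-lived features, so the window length is a judgement call for the analyst.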

Other informative descriptive analyses might include
summary statistics, a correlation matrix for the
covariates to be included in the model and an exploration
of missing data. Our example dataset contains no
missing values, but there are frequently missing
exposure data that need to be handled in the initial data
processing or in the analysis itself: dropping
incomplete records is a simple strategy but may introduce
bias; employing rules, algorithms or models to impute
the missing data (singly or multiply) are alternatives, 9
but are not considered further here.

Figure 1 Raw plots showing outcome (deaths) and exposure (ozone) data over time (London data)



Time series regression 

After carrying out some initial descriptive analyses, 
the next step is to begin to develop a regression 
model (see Analysis using Poisson regression) that 
will enable us to address our principal study question. 

Aim of regression analysis 

The main aim of regression is to investigate whether 
some of the short-term variation in the outcome can 
be explained by changes in the main exposure; in our 
example, whether day-to-day changes in the number 
of deaths are explained in part by changes in the 
levels of ozone in the air. A regression approach will 
also allow control for multiple potential confounding 
factors. 

Analysis using Poisson regression 

The outcome variable here is a count (the number of 
deaths each day). The usual regression method of 
choice for analysing count data is Poisson regression, 
but we need to bear in mind some of the unique 
features of time series data of this type: 

• In the raw data, long-term patterns including sea- 
sonality are likely to dominate the data (as in our 



example). As our interest is in short-term associ- 
ations, the aim is to remove (i.e. control for) these 
long-term patterns, and see whether the exposure 
of interest explains some of the remaining short- 
term variation. Possible strategies to control for 
long-term patterns are covered in detail in the 
next section. 

• An assumption of Poisson regression is violated in 
the raw data: observations are unlikely to be inde- 
pendent, with observations close in time likely to 
be more similar than those distant in time (in the 
London dataset, this is very clear from Figure 1). 
However, this 'autocorrelation' is usually not in- 
trinsic to the outcome series, but rather due to 
autocorrelation in the explanatory variables that 
are predictors of the outcome. After controlling 
for seasonality, long-term patterns, the exposure 
of interest and other explanatory variables, residual 
autocorrelation will tend to be much smaller than 
in the raw outcome data, and is usually not a 
major concern. Nevertheless, at the model checking 
stage it may be a good idea specifically to model 
any remaining autocorrelation and check that our 
conclusions do not change (see Model checking 
and sensitivity analysis). 

• The data tend to be 'overdispersed', meaning that
the variance of the outcome counts is higher than
predicted under a Poisson distribution (in which
variance = mean), so it is necessary to apply a
simple adjustment to obtain appropriate standard
errors in the model fitting (specifically, a scale
parameter is applied, estimated by the Pearson
chi-square statistic divided by the residual degrees of
freedom). 10
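The overdispersion adjustment described in the last bullet can be illustrated as follows. This is a sketch under stated assumptions: the function names and the toy counts are hypothetical, and the fitted values are taken as given rather than produced by a real model fit.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Scale parameter: Pearson chi-square / residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than model parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / resid_df


def scaled_standard_error(se, dispersion):
    """Inflate a Poisson-model standard error by sqrt(scale parameter)."""
    return se * math.sqrt(dispersion)


# Toy example: intercept-only fit, so every fitted value is the mean count
observed = [16, 4, 10, 10]
fitted = [10.0, 10.0, 10.0, 10.0]
phi = pearson_dispersion(observed, fitted, n_params=1)
print(phi, scaled_standard_error(0.5, phi))
```

A value of the scale parameter well above 1 indicates overdispersion, and standard errors from the unadjusted Poisson model would then be too small.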



Controlling for seasonality and 
long-term trends 

To reiterate, the research question to be addressed is 
whether short-term variation in the outcome is 
explained by the exposure of interest; i.e. in our 
example, whether day-to-day changes in mortality 
are related to daily ozone levels. But the raw outcome 
data are likely to be initially dominated by seasonal 
patterns and long-term trends (Box 1), so it is neces- 
sary to control for these patterns in the regression 
model in order to effectively separate them out from 
the short-term associations between exposure and 
outcome that we are interested in. There are a 
number of ways to achieve this, but what they have 
in common is that some function of time is fitted as 
part of the regression model. 
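In symbols, the structure described above might be written as follows. The notation is an assumption introduced here for illustration: $Y_t$ is the outcome count on day $t$, $x_t$ the exposure, $f(t)$ the chosen function of time, and $\phi$ the overdispersion scale parameter.

```latex
\begin{align}
  Y_t &\sim \text{Poisson}(\mu_t), \qquad \operatorname{Var}(Y_t) = \phi\,\mu_t \\
  \log \mu_t &= \alpha + f(t) + \beta x_t \\
  \text{RR per 10-unit increase in } x_t &= \exp(10\,\beta)
\end{align}
```

The three options that follow differ only in the form chosen for $f(t)$.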

Option 1: Time stratified model (simple indicator 
variables) 

• A simple way of approximately modelling long- 
term patterns in the outcome data is to split the 
study period into intervals and estimate for each 
interval a different baseline mortality risk. In prac- 
tice, this means simply including an indicator vari- 
able for each time interval in the Poisson model. 
One possible choice of time interval for daily data
is elapsed calendar month, such that in these data
there are 12 × 5 = 60 strata.

• Pros: easy to understand, and often captures main 
long-term patterns quite well. 

• Cons: potentially large number of model para- 
meters; implicitly assumes biologically implausible 
jumps in risk between adjacent time intervals. 

• Figure 2a illustrates the predicted numbers of
deaths from such a calendar-month stratified
model applied to the London data.
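As a rough sketch of the time-stratified approach, the stratum for each day and the corresponding indicator variables could be generated as below. The helper names are hypothetical; only the study dates (1 January 2002 to 31 December 2006) come from the text.

```python
from datetime import date, timedelta


def daily_dates(first, last):
    """All calendar dates from `first` to `last` inclusive."""
    n_days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(n_days)]


def month_stratum(day, start_year):
    """Index of the elapsed calendar month (0 for the first month)."""
    return (day.year - start_year) * 12 + (day.month - 1)


def indicator_row(stratum, n_strata):
    """0/1 indicators for strata 1..n_strata-1; stratum 0 is the reference."""
    return [1 if stratum == s else 0 for s in range(1, n_strata)]


dates = daily_dates(date(2002, 1, 1), date(2006, 12, 31))
strata = [month_stratum(d, 2002) for d in dates]
n_strata = len(set(strata))
print(len(dates), n_strata)  # 1826 days, 60 strata
```

Each row from `indicator_row` would then enter the Poisson model alongside the exposure, giving one baseline level per elapsed calendar month.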



Box 1 Studying short-term associations in the 
presence of longer-term variation 

Seasonal and long-term patterns in both the expo- 
sure and outcome data can dominate crude associa- 
tions, making the short-term associations of 
interest hard to detect. 

By explicitly controlling for long-term patterns, the 
association between the exposure variable(s) of 
interest and the short-term variation around these 
long-term patterns can be explored. 



Option 2: Periodic functions (Fourier terms) 

• Long-term patterns can be modelled more 
smoothly by fitting Fourier terms in the Poisson 
model. These are pairs of sine and cosine functions 
of time with an underlying period reflecting the 
full seasonal cycle (i.e. calendar year), and are par- 
ticularly suited to capturing very regular seasonal 
patterns. A single sine/cosine pair will model sea- 
sonal variation in the outcome as a regular wave 
with a single (equally spaced) peak and trough per 
calendar year (the actual position of the peak and 
trough are guided by the data). However, harmo- 
nics (extra sine/cosine pairs with shorter wave- 
lengths) can also be introduced which results in 
more flexible functions. 

• Pros: models long-term patterns smoothly, using 
relatively few parameters. 

• Cons: more mathematically complex than the time- 
stratified model; the modelled seasonal pattern is 
always forced to be the same from one year to the 
next, which may not reflect the data well (e.g. 
timing of winter peaks in deaths may vary). 
Fourier terms alone cannot capture long-term 
non-seasonal trends (this can be solved by adding 
a further function of calendar time). 



Figure 2 Three alternative ways of modelling long-term
patterns in the data (seasonality and trends)

• Figure 2b illustrates model fit for the London data
using 4 sine/cosine pairs (1 fundamental plus 3 
harmonics) to capture seasonality, plus a linear 
function of time to capture broader trends over 
time. 
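The Fourier terms can be generated directly from the time index. The sketch below assumes time is measured in days and that a period of 365.25 days stands in for the calendar year; both are choices made here for illustration.

```python
import math


def fourier_terms(t, n_pairs=4, period=365.25):
    """Sine/cosine pairs for day index t.

    Pair k has wavelength period / k, so k = 1 is the fundamental
    annual cycle and k = 2, 3, ... are the harmonics.
    """
    terms = []
    for k in range(1, n_pairs + 1):
        angle = 2 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


# One row of 8 seasonal terms per day; a linear term in t would be
# added separately to capture the longer-term trend.
design_rows = [fourier_terms(t) + [t] for t in range(1826)]
print(len(design_rows), len(design_rows[0]))  # 1826 rows, 9 columns
```

Because every term repeats exactly after one period, the fitted seasonal shape is identical in each year, which is the limitation noted in the 'Cons' above.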

Option 3: Flexible spline functions 

• The third option is to fit a spline function of time;
this is essentially a number of different polynomial 
(most commonly cubic) curves that are joined 
smoothly end-to-end to cover the full period. To 
fit a spline function in practice, we first generate 
a set of basis variables which are functions of the 
main time variable, and then include these basis 
variables in the Poisson model. In generating the 
spline basis, it is necessary to decide how many 
knots (join-points) there should be, which governs 
how many end-to-end cubic curves will be used 
and therefore how flexible the curve will be: too 
few will fail to capture the main long-term patterns 
closely, whereas too many will result in a very 
'wobbly' function which may compete with the 
variable of interest to explain the short-term varia- 
tion of interest, widening confidence intervals of 
relative risk estimates. The flexibility of the spline 
function is sometimes framed in terms of number 
of degrees of freedom rather than number of knots, 
where more degrees of freedom corresponds to 
more knots, and both imply a more flexible 
function. 

• Pros: models long-term patterns smoothly; can
capture seasonal patterns in a way that is allowed to
vary from one year to the next; and will also
capture long-term non-seasonal trends in the data.

• Cons: more mathematically complex than the other 
methods (though functions to generate the spline 
basis are available in major statistical packages). 

• Figure 2c illustrates a spline function applied to the
London data, using 34 knots [= (number of calendar
years × 7) - 1], a common choice for daily
mortality data. Although there is no consensus on
how many knots are optimal, 7 per year has been
justified as a balance between providing adequate
control for seasonality and other confounding by
trends in time, while leaving sufficient information
from which to estimate exposure effects. 11
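To make the idea of 'basis variables' concrete, the sketch below places 34 knots and builds a cubic spline basis by hand. Two assumptions should be noted: the knots are taken to be equally spaced, and a truncated power basis is used because it is easy to write out. Statistical packages typically generate better-conditioned bases (for example B-splines or natural splines), so this is an illustration of the principle rather than the method behind Figure 2c.

```python
def interior_knots(n_days, n_years, knots_per_year=7):
    """Equally spaced knots: (n_years * knots_per_year) - 1 of them."""
    n_knots = n_years * knots_per_year - 1
    step = n_days / (n_knots + 1)
    return [step * (i + 1) for i in range(n_knots)]


def cubic_spline_row(t, knots, scale):
    """Truncated power basis for one time point.

    Time is rescaled to [0, 1] to keep the cubic terms numerically small.
    Columns: u, u^2, u^3, then max(u - knot, 0)^3 for every knot.
    """
    u = t / scale
    row = [u, u ** 2, u ** 3]
    row += [max(u - k / scale, 0.0) ** 3 for k in knots]
    return row


n_days = 1826  # 1 January 2002 to 31 December 2006
knots = interior_knots(n_days, n_years=5)
basis = [cubic_spline_row(t, knots, scale=n_days) for t in range(n_days)]
print(len(knots), len(basis[0]))  # 34 knots, 37 basis columns
```

Changing `knots_per_year` is a direct way to vary the flexibility of the curve in sensitivity analyses.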

Residual variation around the long-term 
pattern 

If seasonality and long-term trends are controlled for 
using one of the above approaches, we will be left with 
residual variation in which the long-term patterns are 
no longer apparent (Figure 3). By adding the exposure 
of interest to this model, we can now tackle our main 
aim, which is to investigate whether the remaining 
short-term variation around the long-term pattern is 
in part explained by the exposure variable(s). 



Figure 3 Residual variation in daily deaths after 'removing'
(i.e. modelling) season and long-term trend. Fitted values 
were from a spline model for season and long-term trend 
only (as illustrated in Fig 2c) 

Exposure-outcome associations and 
confounding 

In the London data, fitting a naive Poisson model for
mortality, with ozone as the only explanatory variable
and no adjustment for seasonality or long-term
trends, suggests that each 10 µg/m³ increase in
ozone levels is associated with a mortality risk ratio
of 0.991 (95% CI 0.987 to 0.994, P < 0.001), i.e. higher
ozone is associated with lower mortality risk. But we
know that at least part of this is likely to be explained
by confounding by season. After adding adjustment
for season and long-term trend to the model (using
a flexible spline as in Option 3 above), the direction of
the estimated effect reverses (RR per 10 µg/m³ = 1.007,
95% CI 1.003 to 1.010, P < 0.001; or equivalently, a
0.7% [0.3-1.0] risk increase), suggesting that in the
short term, higher ozone is associated with higher
mortality risk. Small effect sizes such as this are
relatively commonplace in environmental epidemiology,
but small effects are often still of public health
importance if entire populations are exposed.
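The equivalence between a risk ratio and a percentage risk increase quoted above is simple arithmetic, sketched here. The function names are hypothetical; the risk ratios are those reported in the text.

```python
import math


def rr_per_increment(beta_per_unit, increment=10.0):
    """Risk ratio for an `increment`-unit rise, given a log-scale
    coefficient per single unit of exposure."""
    return math.exp(beta_per_unit * increment)


def percent_change(rr):
    """Express a risk ratio as a percentage change in risk."""
    return (rr - 1.0) * 100.0


# Season-adjusted estimate and confidence limits reported in the text
for rr in (1.007, 1.003, 1.010):
    print(rr, round(percent_change(rr), 1))
```

The same conversion applied to the unadjusted estimate of 0.991 gives a 0.9% lower risk per 10 µg/m³.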

Confounding by other time-varying factors 

In the analysis adjusted only for season and long- 
term trend, ozone appears to be positively associated 
with mortality. But could there be confounding by 
other factors? In general epidemiology, common con- 
founders include age, sex, body mass index, smoking 
status, drinking and so on, but these 'standard con- 
founders' do not apply to our data because at the 
population level, the distribution of such factors 
does not (or is unlikely to) change from day to day, 
and cannot be associated with fluctuations in envir- 
onmental exposures such as pollution levels. So what 
are the potential confounders in this kind of study? 
Recall that the units of analysis are the time intervals 
represented by single rows of data (in our case, days), 
and not individuals. Therefore potential confounders
should be variables that can change from day to 
day, and that are plausibly related to daily fluctua- 
tions in our exposure of interest (ozone), as well as 
the outcome (mortality). In this example, a clear 
candidate is ambient temperature, because tempera- 
ture varies from day to day, ozone levels are related 
to temperature (ozone tends to be higher on hotter 
days due to the involvement of sunshine in the gen- 
eration of ozone), and it is well established that tem- 
perature is associated with mortality risk in the short 
term. 3 

Adding current temperature to the model
(allowing for expected non-linearity 4) does indeed
move the estimated ozone-mortality association
towards the null and the adjusted effect is no
longer statistically significant (adjusted mortality
risk ratio per 10 µg/m³ is 1.003, 95% CI 0.999-1.006,
P = 0.11). This suggests that the initially
estimated positive association between current ozone
level and mortality risk was largely explained by
confounding by temperature.

Other potential confounders of the ozone-mortality
association might include further meteorological
parameters such as relative humidity (included in the
dataset), other pollutants and variables capturing
holiday and day of the week (pollution levels are
likely to be related to population-level travel
behaviours, which are likely to differ over holidays
and at weekends, and it is highly plausible that
certain health risks also differ at such times for reasons
unrelated to pollution), but these are not explored
further here.



Box 2 Delayed effects and 'lags' 

A simple analysis would relate the number of out- 
comes on a given day to the exposure levels on that 
day. But we often wish to explore whether there is 
any delayed association. 

By creating time-shifted copies of the exposure
variable and including them in the model, we can
explore the association between outcome today
and exposure on previous days.
Allowing for delayed exposure
effects 

In the London data, our modelling thus far has 
related mortality on a particular day with ozone 
level on the same day. But it is possible that there 
is a delayed (or 'lagged') association between expo- 
sure and outcome. Yesterday's ozone level may be a 
more important predictor of today's mortality risk 
than today's ozone level. Estimating the association 
between yesterday's ozone level and today's mortality 
risk (i.e. the 1-day lagged association) is simply a 
question of shifting the ozone series forward in time 
(i.e. down one row) and re-fitting the previous model
(Box 2). Figure 4a shows how the estimated ozone- 
mortality association (adjusted for temperature) 
changes as we increase the lag time from 0 to 7 
days. There is evidence of an association between 
ozone and mortality when the lag time is between 1 
and 5 days. However, these different lag effects are 
not adjusted for each other; so far each lag has been 
fitted in the model one at a time. To address this, all 
the lagged variables (the 0- to 7-day shifted series) 
can be simultaneously entered in the model. This is 
known as a 'distributed lag model' and, applied to the
London data, results in the effect estimates displayed
in Figure 4b. In comparison with the individual lag
models, all the effect estimates for lag days 0 to 5
inclusive have now moved towards the null,
suggesting (as expected) that the estimated individual lag
effects were confounded by each other. There remains
evidence of independent ozone-mortality associations
at lag days 1 and 2, suggesting that mortality risk on
the current day is positively associated with ozone
levels on the previous 2 days (or, equivalently, current
ozone is positively associated with mortality on the
following 2 days).

Figure 4 Modelling lagged (delayed) associations
between exposure and outcome. Asterisk indicates that
the constraint applied was that the lagged associations
for days 1 and 2 were the same, and for days 3-7 were the
same
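Creating the time-shifted exposure series is mechanically simple, as the sketch below suggests. The function name `lag` and the example ozone values are made up for illustration; days without an available lagged value are marked None and would be dropped from the model fit.

```python
def lag(series, k):
    """Shift a series forward in time by k rows (k >= 0).

    Element t of the result holds the value observed k rows earlier;
    the first k elements have no such value and are set to None.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(series)
    k = min(k, n)
    return [None] * k + list(series[:n - k])


# Made-up ozone values for illustration (not the London data)
ozone = [30, 42, 38, 51, 47, 35, 29, 44, 40, 33]

# Single-lag model: pair today's outcome with yesterday's exposure
ozone_lag1 = lag(ozone, 1)

# Unconstrained distributed lag model: lags 0 to 7 entered together
lagged = {k: lag(ozone, k) for k in range(8)}
complete_days = [t for t in range(len(ozone))
                 if all(lagged[k][t] is not None for k in range(8))]
print(ozone_lag1[:3], complete_days)
```

Note that including lags up to 7 days means the first 7 days of the series cannot contribute to the distributed lag model.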

The disadvantage of the simple 'unconstrained'
distributed lag model is that the lag terms are likely to
be highly correlated, and collinearity in the model can
result in imprecise estimates (wide confidence
intervals). It is possible to overcome this by imposing some
constraint on the effect estimates for the different
lags (a 'constrained distributed lag model').
Figure 4c displays the results of imposing a simple
constraint on the distributed lag model, namely that
the effect estimates for days 1 and 2 are the same,
and the effect estimates for days 3 to 7 inclusive are
the same (a so-called 'lag-stratified' distributed
lag model, 4 which might be justified by the broad
patterns revealed in the unconstrained model of
Figure 4b). Collinearity is now much reduced, fewer
parameters need to be estimated, and associations at
individual lags are estimated with greater precision,
though a potential criticism of this approach is that
the choice of constraints, if not pre-specified, could be
argued to be too 'data-driven'. More complex
constraints including smooth (polynomial or spline)
functions of lag time can be applied. 6
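One way to impose the lag-stratified constraint is to replace the individual lagged series with their sum within each stratum: if lags 1 and 2 share a coefficient b, then b·x(t-1) + b·x(t-2) = b·[x(t-1) + x(t-2)]. The sketch below builds these summed terms. It assumes lag 0 keeps its own coefficient, and the function name and toy series are hypothetical.

```python
def lag_stratified_terms(series, strata=((0, 0), (1, 2), (3, 7))):
    """Summed exposure per lag stratum for each day.

    `strata` lists inclusive (first_lag, last_lag) ranges. Days without
    a full exposure history for the longest lag are returned as None.
    """
    max_lag = max(last for _, last in strata)
    rows = []
    for t in range(len(series)):
        if t < max_lag:
            rows.append(None)
            continue
        rows.append([sum(series[t - k] for k in range(first, last + 1))
                     for first, last in strata])
    return rows


# Toy series 0, 1, ..., 9 so the sums are easy to verify by hand
rows = lag_stratified_terms(list(range(10)))
print(rows[7])  # [7, 11, 10]: lag 0; lags 1-2 (6+5); lags 3-7 (4+3+2+1+0)
```

With this coding, the coefficient on each summed term is the common per-lag effect within that stratum, and three parameters replace the eight of the unconstrained model.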

The cumulative effect of an exposure over several 
lag days can be calculated from a distributed lag 
model as the sum of the coefficients. These estimates 
are often similar in constrained and unconstrained 
models (as in the London data, shown on the right 
hand side in Figure 4b and c), and the similar con- 
fidence interval widths for the cumulative effect esti- 
mates from the constrained and unconstrained 
models have been observed before. 4 
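The cumulative effect and its confidence interval can be computed from the fitted coefficients and their covariance matrix, since the variance of a sum of estimates is the sum of all entries of that matrix. The sketch below assumes coefficients on the log scale per unit of exposure; the function name and the numbers passed to it are purely illustrative and are not results from the London analysis.

```python
import math


def cumulative_effect(coefs, cov, increment=10.0, z=1.96):
    """Cumulative risk ratio over all lags with an approximate 95% CI.

    coefs: per-lag log-scale coefficients (one per lag day).
    cov:   their covariance matrix as a list of lists.
    """
    total = sum(coefs)
    variance = sum(sum(row) for row in cov)  # Var of the sum of estimates
    se = math.sqrt(variance)
    rr = math.exp(increment * total)
    lower = math.exp(increment * (total - z * se))
    upper = math.exp(increment * (total + z * se))
    return rr, lower, upper


# Illustrative values only: two lags with uncorrelated estimates
coefs = [0.001, 0.002]
cov = [[1e-6, 0.0],
       [0.0, 1e-6]]
print(cumulative_effect(coefs, cov))
```

In a lag-stratified model coded with summed exposure terms, each stratum coefficient would need to be counted once for every lag day it covers before summing.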

Potentially confounding time-varying factors may 
also have lagged effects, which can be modelled in 
the same way. 

Short-term displacement, or 'harvesting' 

Distributed lag models sometimes reveal an appar- 
ently odd feature: a raised risk ratio at short 
lags followed by an apparently protective effect 
at longer lags. For example, a study relating 
ambient temperature to hospital admissions for 
heart disease found that admissions increased on 
days with very high temperatures, but several days 
after the high temperature episode there were fewer 
admissions than expected. 12 This suggests that highly 
vulnerable people who were in any case within days 
of being admitted to hospital due to heart disease 
may have simply had their heart problem 
brought forward by a few days as a result of the 
high temperature episode. This is a phenomenon 
known as short-term displacement, or 'harvesting'. 
If harvesting appears to be present, the extent to 
which the short-term risk increase is 'cancelled out' 
by reductions in risk at longer lags can be ascertained 



by considering the cumulative association between 
exposure and outcome over the full lag period (esti- 
mated by summing the model coefficients, as 
described earlier). 



Model checking and sensitivity 
analysis 

Having developed one or more models, before pre- 
sentation it is essential to check residual plots and 
carry out sensitivity analyses in order to reveal any 
problems with the model assumptions, anomalies in 
the data, residual autocorrelation, or sensitivities of 
the main results to the decisions that have been 
made. 13 Useful diagnostic plots based on deviance 
residuals are described in the Supplementary 
Appendix (available as Supplementary data at 
IJE online), and other papers provide more
detail. 6,14
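Two quantities useful at this stage, deviance residuals from a Poisson model and their autocorrelation at a given lag, can be computed as sketched below. The function names are assumptions made here, and the observed and fitted values would come from the analyst's own model.

```python
import math


def deviance_residuals(observed, fitted):
    """Poisson deviance residuals: sign(y - mu) * sqrt(2[y log(y/mu) - (y - mu)])."""
    residuals = []
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        deviance = 2.0 * (log_term - (y - mu))
        sign = 1.0 if y >= mu else -1.0
        # max() guards against tiny negative values from rounding
        residuals.append(sign * math.sqrt(max(deviance, 0.0)))
    return residuals


def autocorrelation(x, k=1):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((x[i] - mean) * (x[i - k] - mean) for i in range(k, n))
    return num / denom


# Toy check: a steadily rising series is positively autocorrelated
print(autocorrelation([1, 2, 3, 4, 5], k=1))
```

Plotting the residuals against time, and their autocorrelation across a range of lags, would show whether seasonal structure or serial dependence remains after model fitting.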

In addition, since the modelling process that we 
have outlined involves many decisions, we would 
recommend carrying out multiple sensitivity analyses 
to check that the main conclusions are robust to 
changes in these decisions. Sensitivity analyses
might include changing the amount of control for
seasonality and long-term trends in the model (e.g. by
changing the number of knots in the spline-based
approach, or harmonics in the Fourier terms
approach); specifying exposure and confounder
variables in different ways (e.g. in the London
analysis, we might try including relative humidity as a
linear instead of categorical variable, or adjusting for
maximum instead of mean daily temperature);
changing the way lagged effects are included in the
model; and changing other key context-specific
decisions.



Precision and power considerations 

To our knowledge, there has been little formal 
development of power calculation methodology in 
this context, which may reflect the preponderance of 
studies using secondary or routinely collected data, 
where all available data are used. Nevertheless, a 
few broad points can be made. Factors determining 
the precision of a study include the length of the 
series (e.g. number of days) and the number of 
events (e.g. deaths) per day. Overdispersion (high 
variability in counts) can also reduce precision. For 
power, the size of effect that is plausible is also 
important. In the authors' experience, studies of pol- 
lution effects, one of the most common applications, 
need thousands of observation days with an average 
of tens of events per day, for credible precision and 
power. 
Box 3 Summary of key considerations and steps in a time series regression study

Explore data with simple plots and tabulations: 

• Plot of exposure variable(s) against time 

• Plot of outcome against time 

• Correlation matrix for exposure and outcome variables 

• Summary statistics for each variable 

• Summary of missing data in each variable. 

Methods to control for seasonality and long-term trends: 

• Indicator variables for time strata (time-stratified model)

• Periodic functions of time (sine/cosine functions) 

• Flexible spline functions of time. 

Modelling the exposure-outcome association — immediate vs delayed effects: 

• Individual lag models considering different lags one at a time 

• Distributed lag models considering all lags in a single model (unconstrained, or constrained to reduce 
collinearity) 

• Consider possible non-linear associations as in other regression contexts.

Model checking:

• Diagnostic plots based on deviance residuals (see web appendix) 

• Multiple sensitivity analyses changing key modelling decisions 



Further extensions 

Non-linearity in the exposure-outcome association 

• Both the exposure of interest and other time-vary- 
ing confounders might have non-linear associa- 
tions with the outcome. 

• This can be modelled as in other contexts: by using 
categorical variables, quadratic or higher order 
polynomials, flexible spline curves or piecewise 
linear 'threshold' models. 4 

Investigation of effect modifiers 

• Individual-level factors may still be effect modifiers 
(e.g. are the elderly more vulnerable to any detri- 
mental effects of ozone?). 

• This can be investigated provided it is possible to 
break down the overall outcome counts into stra- 
tum-specific counts based on the potential effect 
modifier. 

Analysis of data from multiple locations 

• Separate analysis by location (e.g. specific cities) 
can increase power and provide information on 
heterogeneity and adaptation to environmental 
exposures. 

• Patterns in location-specific effect estimates can be
explored through techniques analogous to those
used in meta-analysis, 15 or by modelling all the
data in a single location-stratified model. 2



Summary 

In this article we have outlined the key steps and com- 
plexities involved in carrying out a basic time series 
regression analysis (Box 3), and illustrated these in an 
example. Issues specific to time series regression are the 
presence of long-term and seasonal patterns, the possi- 
bility of delayed or non-linear associations between 
exposure and outcome, and the presence of autocorrela- 
tion. Aside from these, time series regression is no dif- 
ferent from regression techniques used in other areas, 
and the broad steps involved (plotting and tabulating 
the data, controlling for confounding, presenting expo- 
sure effects appropriately and model checking) will be 
familiar to epidemiologists from all disciplines. 



Supplementary Data 

Supplementary data are available at IJE online. 
KEY MESSAGES

• Time series regression is often used in studies attempting to quantify short-term associations of 
environmental exposures, such as air pollution, pollen, dust or weather variables, with health 
outcomes. 

• Time series data in these contexts may be analysed using Poisson regression models, with some 
extensions to deal with issues specific to time series regression, including the presence of long- 
term and seasonal patterns, the possibility of delayed or non-linear associations between exposure 
and outcome, and the presence of autocorrelation. 

• Other steps involved in carrying out a time series study (plotting and tabulating the data, controlling 
for confounding, presenting exposure effects appropriately and model checking) will be familiar to 
epidemiologists from all disciplines. 